Bootstrap Methods for the Cost-Sensitive Evaluation of Classifiers

Authors

  • Dragos D. Margineantu
  • Thomas G. Dietterich
Abstract

Many machine learning applications require classifiers that minimize an asymmetric cost function rather than the misclassification rate, and several recent papers have addressed this problem. However, these papers have either applied no statistical testing or have applied statistical methods that are not appropriate for the cost-sensitive setting. Without good statistical methods, it is difficult to tell whether these new cost-sensitive methods are better than existing methods that ignore costs, and it is also difficult to tell whether one cost-sensitive method is better than another. To rectify this problem, this paper presents two statistical methods for the cost-sensitive setting. The first constructs a confidence interval for the expected cost of a single classifier. The second constructs a confidence interval for the expected difference in costs of two classifiers. In both cases, the basic idea is to separate the problem of estimating the probabilities of each cell in the confusion matrix (which is independent of the cost matrix) from the problem of computing the expected cost. We show experimentally that these bootstrap tests work better than applying standard z tests based on the normal distribution.
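The abstract does not spell out the resampling procedure, so the following is only a minimal sketch of the idea it describes, assuming a plain percentile bootstrap. Cell probabilities are estimated first, without reference to costs, and the cost matrix is applied only to the simulated samples. The function names, the `cost[predicted][true]` convention, and the toy data are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter

def bootstrap_interval(outcomes, value, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of value(cell).

    Step 1 estimates cell probabilities from the observed outcomes and
    never looks at costs; step 2 applies `value` to simulated samples.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    counts = Counter(outcomes)                  # cell -> observed count
    cells = list(counts)
    probs = [counts[c] / n for c in cells]      # estimated cell probabilities
    means = []
    for _ in range(n_boot):
        sample = rng.choices(cells, weights=probs, k=n)
        means.append(sum(value(c) for c in sample) / n)
    means.sort()
    lo_idx = int(n_boot * alpha / 2)
    return means[lo_idx], means[n_boot - 1 - lo_idx]

def cost_interval(y_true, y_pred, cost, **kw):
    """Interval for the expected cost of one classifier."""
    cells = list(zip(y_pred, y_true))           # confusion-matrix cells
    return bootstrap_interval(cells, lambda c: cost[c[0]][c[1]], **kw)

def cost_difference_interval(y_true, pred_a, pred_b, cost, **kw):
    """Interval for expected cost(A) - cost(B) on the same test set."""
    cells = list(zip(pred_a, pred_b, y_true))   # joint cells for both classifiers
    diff = lambda c: cost[c[0]][c[2]] - cost[c[1]][c[2]]
    return bootstrap_interval(cells, diff, **kw)

# Hypothetical data. Assumed convention: cost[predicted][true];
# predicting 0 when the truth is 1 costs 10, the reverse costs 1.
cost = [[0, 10], [1, 0]]
y_true = [0] * 30 + [1] * 10
pred_a = [0] * 27 + [1] * 3 + [1] * 8 + [0] * 2
pred_b = [0] * 30 + [1] * 5 + [0] * 5
print(cost_interval(y_true, pred_a, cost))
print(cost_difference_interval(y_true, pred_a, pred_b, cost))
```

Under this sketch, a difference interval that excludes zero would suggest that one classifier has lower expected cost than the other at the chosen level. Whether the paper applies any smoothing to the estimated cell probabilities is not stated in the abstract, so none is applied here.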


Similar resources

Bootstrap Methods for the Cost-Sensitive Evaluation of Classifiers

Many machine learning applications require classifiers that minimize an asymmetric cost function rather than the misclassification rate, and several recent papers have addressed this problem. However, these papers have either applied no statistical testing or have applied statistical methods that are not appropriate for the cost-sensitive setting. Without good statistical methods, it is difficult t...


Sensitive Error Correcting Output Codes

Sensitive error correcting output codes are a reduction from cost sensitive classification to binary classification. They are a modification of error correcting output codes [3] which satisfy an additional property: regret for binary classification implies at most 2 l2 regret for cost-estimation. This has several implications: 1) Any 0/1 regret minimizing online algorithm is (via the reduction) a r...


Sensitive Error Correcting Output Codes

We present a reduction from cost sensitive classification to binary classification based on (a modification of) error correcting output codes. The reduction satisfies the property that regret for binary classification implies l2-regret of at most 2 for cost-estimation. This has several implications: 1) Any regret-minimizing online algorithm for 0/1 loss is (via the reduction) a regret-minimizing onl...


The Effect of Observation Data Sampling Methods on Infiltration Areas by Maximum Entropy Model

Statistical modeling methods are based on multivariate regression methods and require the presence and absence location of data for the construction of the model. In most cases, there is no trustworthy absence data. Therefore, other methods that are based only on the presence of the phenomenon are used. Considering the importance of modeling - saving time and cost and the probable prediction of...


Evaluation of Palm Groves Technical Efficiency Using Bootstrap Data Envelopment Analysis: A Case Study of Roodkhanehbar Area, Iran

The Roodkhanehbar area, with approximately 111 thousand Keriteh palm trees, is one of the most important date-producing areas in Rudan County [1] and is, directly or indirectly, the source of people's income in this area. As a result, production efficiency is of critical importance to the orchardists in this region. This study aims to evaluate technical efficiency of palm groves in this ...




Publication date: 2000